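We develop an autonomous navigation algorithm for a robot operating in two-dimensional environments cluttered with obstacles of arbitrary convex shape. The proposed navigation approach relies on hybrid feedback to guarantee global asymptotic stabilization of the robot at a predefined target location while ensuring forward invariance of the obstacle-free workspace. The main idea is to design an appropriate switching strategy between a move-to-target mode and an obstacle-avoidance mode, based on the proximity of the robot to the nearest obstacle. When the robot is initialized away from the obstacle boundaries, the proposed hybrid controller generates continuous velocity input trajectories. Finally, we provide an algorithmic procedure for the sensor-based implementation of the proposed hybrid controller and validate its effectiveness through simulation results.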
translated by 谷歌翻译
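The switching idea in the navigation abstract above can be illustrated with a toy example. The sketch below is not the paper's controller and carries none of its stability guarantees: it assumes disc-shaped obstacles, a point robot, hand-picked thresholds `eps_in` and `eps_out`, and a simple "slide along the tangent" avoidance rule. All function and parameter names are hypothetical.

```python
import math

def nearest_obstacle(p, obstacles):
    """Return (clearance, outward unit normal) for the closest disc (cx, cy, r)."""
    best = None
    for cx, cy, r in obstacles:
        dx, dy = p[0] - cx, p[1] - cy
        norm = math.hypot(dx, dy)
        clearance = norm - r
        if best is None or clearance < best[0]:
            best = (clearance, (dx / norm, dy / norm))
    return best

def hybrid_step(p, target, obstacles, mode, eps_in=0.3, eps_out=0.5, speed=1.0):
    """Return (velocity, new_mode) for a point robot at position p."""
    gx, gy = target[0] - p[0], target[1] - p[1]
    dist = math.hypot(gx, gy)
    if dist < 1e-9:
        return (0.0, 0.0), mode
    g = (gx / dist, gy / dist)                    # unit vector towards the target
    clearance, n = nearest_obstacle(p, obstacles)
    heading_in = g[0] * n[0] + g[1] * n[1] < 0.0  # target direction points into the obstacle

    # Switching logic (assumed): enter avoidance when close and heading in,
    # leave it when far enough or when the target direction no longer points in.
    if mode == "move_to_target" and clearance <= eps_in and heading_in:
        mode = "avoid"
    elif mode == "avoid" and (clearance >= eps_out or not heading_in):
        mode = "move_to_target"

    if mode == "move_to_target":
        k = min(speed, dist)                      # slow down near the target
        return (k * g[0], k * g[1]), mode
    # Avoidance: move along the boundary tangent that best agrees with g.
    t = (-n[1], n[0])
    if t[0] * g[0] + t[1] * g[1] < 0.0:
        t = (n[1], -n[0])
    return (speed * t[0], speed * t[1]), mode

def simulate(start, target, obstacles, dt=0.01, steps=3000):
    """Euler-integrate the toy controller; return (final position, minimum clearance)."""
    p, mode = start, "move_to_target"
    min_clearance = nearest_obstacle(p, obstacles)[0]
    for _ in range(steps):
        v, mode = hybrid_step(p, target, obstacles, mode)
        p = (p[0] + dt * v[0], p[1] + dt * v[1])
        min_clearance = min(min_clearance, nearest_obstacle(p, obstacles)[0])
    return p, min_clearance
```

Calling `simulate((-3.0, 0.0), (3.0, 0.0), [(0.0, 0.0, 1.0)])` drives the robot around a unit disc placed directly between start and target. The discrete mode variable is what makes the feedback hybrid rather than purely continuous.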
Multimodal deep learning has been used to predict clinical endpoints and diagnoses from clinical routine data. However, these models suffer from scaling issues: they have to learn pairwise interactions between each piece of information in each data type, thereby escalating model complexity beyond manageable scales. This has so far precluded a widespread use of multimodal deep learning. Here, we present a new technical approach of "learnable synergies", in which the model only selects relevant interactions between data modalities and keeps an "internal memory" of relevant data. Our approach is easily scalable and naturally adapts to multimodal data inputs from clinical routine. We demonstrate this approach on three large multimodal datasets from radiology and ophthalmology and show that it outperforms state-of-the-art models in clinically relevant diagnosis tasks. Our new approach is transferable and will allow the application of multimodal deep learning to a broad set of clinically relevant problems.
The success of deep learning applications critically depends on the quality and scale of the underlying training data. Generative adversarial networks (GANs) can generate arbitrarily large datasets, but their diversity and fidelity are limited. This limitation has recently been addressed by denoising diffusion probabilistic models (DDPMs), whose superiority has been demonstrated on natural images. In this study, we propose Medfusion, a conditional latent DDPM for medical images. We compare our DDPM-based model against GAN-based models, which constitute the current state of the art in the medical domain. Medfusion was trained and compared with (i) StyleGan-3 on n=101,442 images from the AIROGS challenge dataset to generate fundoscopies with and without glaucoma, (ii) ProGAN on n=191,027 images from the CheXpert dataset to generate radiographs with and without cardiomegaly, and (iii) wGAN on n=19,557 images from the CRCMS dataset to generate histopathological images with and without microsatellite stability. In the AIROGS, CRCMS, and CheXpert datasets, Medfusion achieved a lower (=better) FID than the GANs (11.63 versus 20.43, 30.03 versus 49.26, and 17.28 versus 84.31). Fidelity (precision) and diversity (recall) were also higher (=better) for Medfusion in all three datasets. Our study shows that DDPMs are a superior alternative to GANs for image synthesis in the medical domain.
Recent advances in computer vision have shown promising results in image generation. Diffusion probabilistic models in particular have generated realistic images from textual input, as demonstrated by DALL-E 2, Imagen and Stable Diffusion. However, their use in medicine, where image data typically comprises three-dimensional volumes, has not been systematically evaluated. Synthetic images may play a crucial role in privacy-preserving artificial intelligence and can also be used to augment small datasets. Here we show that diffusion probabilistic models can synthesize high-quality medical imaging data, which we demonstrate for Magnetic Resonance Images (MRI) and Computed Tomography (CT) images. We provide quantitative measurements of their performance through a reader study with two medical experts who rated the quality of the synthesized images in three categories: realistic image appearance, anatomical correctness, and consistency between slices. Furthermore, we demonstrate that synthetic images can be used in self-supervised pre-training and improve the performance of breast segmentation models when data is scarce (Dice score 0.91 vs. 0.95 without vs. with synthetic data).
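For readers unfamiliar with the Dice score reported above, the following minimal sketch computes it for two binary segmentation masks. It assumes masks are given as flat sequences of 0/1 values and that two empty masks count as a perfect match; the function name is hypothetical and this is not the evaluation code used in the study.

```python
def dice_score(pred, truth):
    """Dice = 2 * |A and B| / (|A| + |B|) for two flat binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same length")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention assumed here: two empty masks agree perfectly.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

A score of 1.0 means the predicted and reference masks overlap exactly, and 0.0 means they share no foreground pixels.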
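Various methods based on non-negative matrix factorization (NMF) add new terms to the cost function to adapt the model to specific tasks, such as clustering, or to preserve certain structural properties in the reduced space (e.g., local invariance). The added terms are mainly weighted by a hyperparameter that controls the balance of the overall formulation and guides the optimization process towards the objective. The result is a parameterized NMF method. However, NMF methods estimate the factorizing matrices in an unsupervised way, so the ability to perform prediction (e.g., classification) using the newly obtained features is not guaranteed. The objective of this work is to design an evolutionary framework that learns the hyperparameter of the parameterized NMF and estimates the factorizing matrices in a supervised way, making them more suitable for classification problems. Moreover, we claim that applying NMF-based algorithms separately to different class pairs, instead of applying them once to the whole dataset, improves the effectiveness of the matrix factorization process. This results in training multiple parameterized NMF algorithms with different balancing parameter values. A cross-validation combination learning framework is adopted, and a genetic algorithm is used to identify the optimal set of hyperparameter values. The experiments we conducted on both real and synthetic datasets demonstrate the effectiveness of the proposed approach.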
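The hyperparameter search described above can be pictured with a small genetic algorithm over a single scalar. This is only a sketch under stated assumptions: `fitness` is a hypothetical callable standing in for the cross-validated classification score of a parameterized NMF, and the selection, crossover, and mutation rules are generic choices rather than those of the paper.

```python
import random

def genetic_search(fitness, low, high, pop_size=20, generations=40, sigma=0.1, seed=0):
    """Search [low, high] for a scalar that maximizes fitness(value)."""
    rng = random.Random(seed)
    population = [rng.uniform(low, high) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]   # truncation selection: keep the better half
        children = [ranked[0]]              # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Arithmetic crossover followed by Gaussian mutation, clipped to the range.
            child = 0.5 * (a + b) + rng.gauss(0.0, sigma * (high - low))
            children.append(min(high, max(low, child)))
        population = children
    return max(population, key=fitness)
```

For instance, `genetic_search(lambda lam: -(lam - 0.3) ** 2, 0.0, 1.0)` searches for the value closest to 0.3. In the setting of the abstract, one such search would be run per class pair, each with its own fitness function.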
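The model described in this paper belongs to the family of non-negative matrix factorization methods designed for data representation and dimensionality reduction. In addition to preserving the positivity of the data, it also aims to preserve the structure of the data during matrix factorization. The idea is to add a penalty term to the NMF cost function that imposes a scale relationship between the pairwise similarity matrices of the original and transformed data points. Solving the new model involves deriving a new parameterized update scheme for the coefficient matrix, which makes it possible to improve the quality of the reduced data when used for clustering and classification. The proposed clustering algorithm is compared with several existing NMF-based algorithms and with several manifold-learning-based algorithms on real-life datasets. The obtained results show the effectiveness of the proposed algorithm.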
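As background for the update scheme mentioned above, the sketch below implements the standard multiplicative updates for plain NMF under the squared Frobenius cost (the well-known Lee and Seung rules). It deliberately omits the structure-preserving penalty term, whose exact form is not given in the abstract, so it is a baseline illustration and not the proposed model. Matrices are plain lists of lists, and all names are hypothetical.

```python
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(row) for row in zip(*A)]

def frobenius_error(X, W, H):
    """Squared Frobenius norm of X - W H."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry))

def nmf(X, rank, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative n x m matrix X into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), element-wise
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[h * a / (b + eps) for h, a, b in zip(rh, ra, rb)]
             for rh, ra, rb in zip(H, num, den)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[w * a / (b + eps) for w, a, b in zip(rw, ra, rb)]
             for rw, ra, rb in zip(W, num, den)]
    return W, H
```

Because every update multiplies a non-negative entry by a non-negative ratio, `W` and `H` stay non-negative without any explicit projection. A penalized variant would change the numerator and denominator of the coefficient-matrix update.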
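In this paper, we propose GlowVC: a multilingual, multi-speaker, flow-based model for language-independent voice conversion (VC). We build on Glow-TTS, which provides an architecture that allows linguistic features to be used during training without requiring them for VC inference. We consider two versions of our model: GlowVC-conditional and GlowVC-explicit. GlowVC-conditional models the distribution of mel-spectrograms with a speaker-conditioned flow and disentangles the mel-spectrogram space into content- and pitch-related dimensions, while GlowVC-explicit models the distribution with an unconditioned flow and disentangles the space into content-, pitch-, and speaker-related dimensions. We evaluate our models in terms of intelligibility, speaker similarity, and naturalness for intra- and cross-lingual conversion in seen and unseen languages. The GlowVC models greatly outperform the AutoVC baseline in terms of intelligibility, while achieving high speaker similarity in intra-lingual VC and slightly worse similarity in the cross-lingual setting. Moreover, we demonstrate that GlowVC-explicit surpasses both GlowVC-conditional and AutoVC in terms of naturalness.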
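Due to the high cost and data limitations involved in training supervised learning models, self-supervised learning (SSL) has recently attracted much attention. The current paradigm in SSL is to apply data augmentation in the input space to create different views of the same image, and to train a model to maximize the similarity between representations of similar images and minimize it for different images. Although this approach achieves state-of-the-art (SOTA) results on various downstream tasks, it leaves augmentation in the latent space unexplored. This paper proposes Trimix, a novel concept for SSL that generates virtual embeddings through linear interpolation of the data, thereby providing the model with novel representations. Our strategy focuses on training the model to extract the original embeddings from the virtual ones, which leads to better representation learning. In addition, we propose a self-consistency term that improves the consistency between the virtual and actual embeddings. We validate Trimix on eight benchmark datasets consisting of natural and medical images, with improvements of 2.71% and 0.41% over the second-best models for the two data types, respectively. Furthermore, our approach outperforms current methods in semi-supervised learning, particularly in low-data regimes. Finally, our pre-trained models show better transfer to other datasets.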
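The linear interpolation mentioned above can be written in a few lines. The sketch below shows the generic two-way convex combination used in mixup-style methods. Whether Trimix mixes two or more embeddings, and how the mixing weight is sampled, is not stated in the abstract, so the function name, the two-input form, and the weight `lam` are assumptions.

```python
def interpolate_embeddings(z_a, z_b, lam):
    """Return the virtual embedding lam * z_a + (1 - lam) * z_b."""
    if len(z_a) != len(z_b):
        raise ValueError("embeddings must have the same dimension")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * a + (1.0 - lam) * b for a, b in zip(z_a, z_b)]
```

With `lam` close to 1 the virtual embedding stays near `z_a`, and with `lam` close to 0 it stays near `z_b`. A training objective would then ask the model to relate the virtual embedding back to the originals.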
With the advancements in deep learning (DL) and an increasing interest in data-driven speech processing methods, there is a major challenge in accessing pathological speech data. Public challenge data offers a potential remedy for this but may expose patient health information through re-identification attacks. Therefore, we investigate in this study whether pathological speech is more vulnerable to such re-identification than healthy speech. Our study is the first large-scale investigation of the effects of different speech pathologies on automatic speaker verification (ASV), using a real-world pathological speech corpus of more than 2,000 test subjects of different ages with various speech and voice disorders. Utilizing a DL-based ASV method, we obtained a mean equal error rate (EER) of 0.89% with a standard deviation of 0.06%, which is a factor of three lower than for comparable healthy speech databases. We further perform detailed analyses of external factors that influence ASV, such as age, pathology, recording environment, utterance length, and intelligibility, to explore their respective effects. Our experiments indicate that some types of speech pathology, in particular dysphonia, are more vulnerable to a breach of privacy than healthy speech, regardless of speech intelligibility. We also observe that the effect of pathology lies in the range of other factors, such as age, microphone, and recording environment.
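The equal error rate quoted above is the operating point at which the false acceptance rate equals the false rejection rate. The sketch below approximates it by sweeping every observed score as a threshold and picking the one where the two rates are closest. It assumes higher scores mean "same speaker" and is not the evaluation code of the study; the function name is hypothetical.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER from similarity scores of genuine and impostor trials."""
    best_gap, eer = None, None
    for threshold in sorted(set(genuine) | set(impostor)):
        # A trial is accepted when its score reaches the threshold.
        far = sum(s >= threshold for s in impostor) / len(impostor)  # false acceptance rate
        frr = sum(s < threshold for s in genuine) / len(genuine)     # false rejection rate
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer
```

A lower EER means speakers are easier to verify, which in this privacy setting translates to a higher re-identification risk.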
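In academia, plagiarism is certainly not an emerging concern, but it has grown in magnitude with the popularization of the Internet and the ease of access to worldwide sources of content, to the point where human intervention alone is insufficient. Nevertheless, plagiarism is far from being an unaddressed problem: computer-assisted plagiarism detection is currently an active area of research within the fields of Information Retrieval (IR) and Natural Language Processing (NLP). Many software solutions have emerged to support this task, and this paper presents an overview of plagiarism detection systems for use in Arabic, French, and English academic and educational settings. Eight systems are compared with respect to their features, usability, and technical aspects, as well as their performance in detecting three levels of obfuscation from different sources: verbatim, paraphrase, and cross-language plagiarism. An in-depth examination of technical forms of plagiarism was also carried out in the context of this study. In addition, a survey of the plagiarism types and classifications proposed by different authors is provided.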